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ABSTRACT 

Direct imaging of exoplanets involves the extraction of very faint signals from 
highly noisy data sets, with noise that often exhibits signihcant spatial, spectral 
and temporal correlations. As a results, a large number of post-processing al¬ 
gorithms have been developed in order to optimally decorrelate the signal from 
the noise. In this paper, we explore four such closely related algorithms, all of 
which depend heavily on the calculation of covariances between large data sets of 
imaging data. We discuss the similarities and differences between these methods, 
and demonstrate how the use sequential calculation techniques can signihcantly 
improve their computational efficiencies. 

Subject headings: methods: analytical — methods: numerical — techniques: image 


processing 
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1. Introduction 


In the last twenty years, the existence of exoplanets (planets orbiting stars other 
than our own sun) has gone from conjecture to established fact. The accelerating rate of 
exoplanet discovery has generated a wealth of important new knowledge, and is due mainly 
to the development and maturation of a large number of technologies that drive a variety 
of planet detection methods. We have now conhrmed nearly o ne thousand exopla nets, 


with several thousand additional candidates already identihed flBatalha et ah 


201311 . The 


majority of these detections were made indirectly, by searching for the effects of a planet on 
its target star either via gravitational interaction as in Doppler spectroscopy surveys, or for 
direct blocking of starlight as in transit photometry. 


While indirect detection methods have proven very successful in discovering exoplanets, 
they are dependent on collecting multiple orbits of data and are thus biased towards 
short-period planets. Direct imaging, on the other hand, is biased towards planets on larger 
orbits, making it highly complementary to the indirect methods. Together, these techniques 
can signihcantly advance our understanding of planet formation and evolution at all orbital 
scales. Additionally, direct imaging provides the most straightforward way of getting planet 
spectra, which are invaluable to the study of planet ary and atmospheri c compositions and 


can serve as proxies for planet mass measurements (iBarman et ah 


20111). For these reasons. 


there is now a concentrated effort to develop direct exoplanet imaging capability, both for 
ground based telescopes and for future space observatories. At the same time, there are 
multiple groups working on the post-processing aspect of planetary imaging, and developing 
more advanced algorithms for extracting planetary signals from highly noisy backgrounds. 


Giant planets are typically millions of times fainter than their host stars, with the 
very brightest (and youngest) emitting approximately 10^ times less light than their parent 
stars in the infra-red. Earth-like planets reflect one part in 10 billion of light from their 
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host stars. Therefore, we require specially-designed, dedicated instrumentation to directly 
image planets. This is usually some form of coronagraph and wavefront control system, 
with many different iterations currently under development. While the instrument is 
designed to take care of both the diffraction and dynamic range problems that make 
exoplanet imaging so difficult, the hnal science images still contain some residual noise at 
approximately the level of the expected planet signal. This noise comes from a variety 
of sources, including imperfections in the instrument optics, non-common path errors in 
instruments with dedicated wavefront sensors, and (especially for ground-based instruments) 
uncorrected residuals from an adaptive optics (AO) system working to counter the effects 
of atmospheric turbulence. The different types of noise also have different spatial and 
temporal distributions - while detector readout noise and shot noise are completely random 
in space and time, noise from optical aberrations and AO residuals (referred to as speckles) 
will be correlated on the scale of the planet signal and will often persist through many 
subsequent images. 


Multiple post-processing schemes have been proposed to improve the odds of extracting 
a planet signal. In general, all of these attempt to model the point spread function (PSF) 
of the instrument, incorporating all static and quasi-static errors, and then subtract this (or 
decorrelate it) from the science image to reveal the residual planet signal. This template 
PSF is constructed from data taken by the same instrument, but in which no planet signal 
is present in the same spatial location. These data sets can either be historical libraries of 
other targets known not to have companions (or, at least, not to have companions that 
would appear in the same part of the image as the current target), or images of the same 
target star but with the planet signal appearing in different parts of the image plane. The 
latter can be accomplished in a variety of ways—by producing angularly differentiated 
data by turning off the telescope de-rotator on ground based instru ments (or spinning 


space-based observatories in between exposures) flMarois et ah 


20061) : or by simultaneously 
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imaging in multiple wavelengths, as with an Integra. 


or by imaging in multiple polarizations flStam et al 


field spectrograph flRacine et al 


1999|); 


20041) ■ All of these techniques produce 


data sets with spatially correlated noise, and decorrelated signal. Once a data set of this 
sort has been created, it is possible to model the underlying noise distribution and to 
generate a PSF template consistent with the data but not containing the planet signal we 
wish to extract. Figure U prese nts a sample data set of this type, showing simulated data 


for the Gemini Planet Imager flMacintosh et al. 


20121) . The first image shows a single 60 


second exposure with high shot and speckle noise. The second image shows the summation 
of one hour of such images, which reveal the static and quasi-static elements of the noise. 
The third image demonstrates the residuals after PSF subtraction, revealing the three 
planet signals in the original data set. 


In §21 we present a standardized notation and problem statement and briefly derive 
some of the most commonly used post-processing techniques, demonstrating the ubiquity 
of the image-to-image covariance calculation. In §3l we discuss how to most efficiently 
calculate the covariance and its inverse, using both sequential calculations and deriving a 
method to account for small changes in the reference image set. We conclude with a sample 
implementation highlighting the utility of these techniques. 


2. Signal Extraction 

2.1. Problem Statement and Notation 

We assume that we have a set of reference images containing random realizations 
drawn from some distribution of noise. We also have a target image containing both a noise 
component drawn from the same distribution, as well as the signal we wish to discover. 
Frequently, the reference and target images are drawn from the same data set, with the 
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target signal appearing in different spatial locations throughout the data. Given this set of 
reference images and the target image, we wish to construct an estimate that minimizes the 
distance from the reference set and maximizes the distance from the target image in the 
spatial location of the planet signal (thereby minimizing the noise). We write our set of n 
vectorized reference images of dimension p as where G and the (vectorized) 

target image as t. The ordering of the vectorization is arbitrary, save that it be applied 
in the same way to each image (i.e., the hnal vectors can be stacked image columns, or 
transposed, stacked image rows, or any other consistent method of reforming a 2D image of 
p elements into a column vector of dimension p). We will use an overbar to represent the 
vector-mean subtracted value of each vector: 

t = t - pit) ; r* = r* - /i(rj) , (1) 

where /i(x) = iY^^=i{^i))/P for a p-element vector x with components Xi,X 2 ,.. .Xp. We 
form a n X p matrix R whose rows are the transposed elements of the set {rj}: 

/?= [ri,r2,...,rj^ , (2) 

and the analgous row-mean subtracted matrix: 

R = [fi,r2,.. ■,rnf . (3) 

Therefore the (image-to-image) sample covariance of the reference set is given by: 

S^^RR^, (4) 

p — I 

where S has dimension n x n. The pixel to pixel covariance is thus: 

S' = -^ifR , (5) 

n — 1 


and has dimension p x p. 
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2.2. Locally Optimal Combination of Images 


One method of solving the stated problem is to generate a least-squares £t to the 
target image via a linear combination of the ref erence set. The hrst im plementation of this 
method for exoplanet imaging was described in 


Lafreniere et ah 


fl2007l) as Locally Optimal 


Combination of Images (LOCI). Written in the formalism of ^2.11 this approach requires 
hnding the n-dimensional vector of optimal coefficients, c, such that: 


c = argmin t — IV c\ 


( 6 ) 


where the norm is typically 

The LOCI procedure is analogous to solving the overdetermined linear system: 


Wc = t 


( 7 ) 


As there are typically more pixels than independent images (i.e., p > n), a unique solution 
will not exist for Equation ([7]). However, when RR^ is full rank, the left-pseudo-inverse of 
R^ gives the minimum least-squares error solution: 

c= {RR'^y^Rt, (8) 

which satishes Equation (|6]) for the norm. The case where RR^ is not directly invertible 
is treated in the next section. 

The estimated signal is the subtraction of the linear combination of references from the 
target image: 

s = t- R^c = {I - R^{RR'^)-^R)t (9) 

where I is the identity matrix. It is important to remember that Equation (|H]) is not an 
exact solution for Equation ([7]), but rather a solution to Equation ([6]). If an exact solution 
existed, this would imply that the signal is in the image of the reference set—that is, a 











linear combination of the noise—and we would get a zero signal estimate, making this 
method inappropriate for this task. 


We can also perform the same least-squares htting to the target image using the 
zero-mean reference set {R) rather than the original reference set {R). All of the steps in 
Equations ([H])-® remain the same, and our signal estimate is now: 


The signal estimates in Equations (jl]) and flTOj) are not equal. In particular. Equation 
ffTOj) generates a zero-mean estimate (i.e., /i(s) = 0) whereas Equation ® generates a 
vector with mean proportional to the difference between the sample means of the target 
and references and the underlying distribution mean. The operator in Equation fllUp is also 
approximately mean and norm preserving, meaning that if it is applied to t rather than t, 
the resulting signal estimate would have approximately the same mean as the target image, 
with the variance of the signal estimate for the mean-subtracted target image. From the 
dehnition of the covariance, we see that S can be written as: 


A = 


p — 1 


{RR^ - 


( 11 ) 


where fiR is the vector of row means of R: 

= ( 12 ) 

and Ipyi is the p element column vector of ones. By the Sherman-Morrison formula 


( Sherman fc Morriso: 


19501 1. this means that 
^-1 = {p-l) \{RR^')-^+p 




and 


(RR 


.T\-l 


p — 1 


s 


-1 p 


1 - pr^r{RR^) Vr J 

S-^piRPilS-^ 


p -1 11 + Vr 


(13) 


(14) 














9 


The second term in Eqnation ffTT|) scales as p/(p — 1)^ and so goes to zero in the limit as 
p —)■ oo. For hnite matrix sizes, the difference between S~^ and {p — 1){RR'^)~^ is a small 
valne proportional to the difference between the calcnlated sample means and the true 
means of the underlying distribution from which the references and target are sampled. 
The zero-mean properties of Equation flTOl) make it easier to extract unbiased estimates of 
the scene, so we will use this form going forward. 


2.3. Covariance Pseudoinverses 


Of course, S is only guaranteed to be positive semi-definite and is therefore 
not necessarily invertible {S is only positive dehnite when all rows of R are linearly 
independent). In these cases we can use pseudoinverses of the covariance to calculate the 


signa. 
as in 


estimates. One op tion is to use singular value decomposition (SVD) based inversion 


Marois et al 


(12010^ . which describes this technique as part of the Speckle-Optimize 


Subtraction for Imaging Exoplanets (SOSIE) pipeline. Real matrix S can be decomposed as 


S = UI:V ^, 


(15) 


where S is a positive semi-definite diagonal matrix of singular values and U and V are 
unitary. Because S is square, all of these matrices will be square and of the same dimensions 
as S. The pseudoinverse of S can then be expressed as: 

S+ = VE+U^, (16) 

where is the pseudoinverse of S—non-zero entries of S are replaced by their reciprocals 
while zero (or small) values are left as zeros. Assuming that the diagonal of S is ordered 
by decreasing magnitude (these values can always be reordered as the columns of U and V 
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form orthogonal bases for the left and right singular vectors of S), this can be expressed as: 

IpxP Opx(n-P) 

Zj ^ 

0(n-P)xP 0{n-P)x{n-P) 

where P is the number of singular values (diagonals of S) that are greater than a desired 
tolerance, e, and where O^xn represents an m x n matrix of zeroes and Inxn represents an 
n X n identity matrix. Equation ffTOj) thus becomes: 

s = t. (18) 

Alternatively, we can use eigendecomposition, which, in this case, is mathematically 
equivalent to SVD. S is Hermitian (symmetric in the strictly real case) and therefore is 
diagonizable as: 

5d> = <hA, (19) 

where $ is the unitary n x n matrix whose columns are the eigenvectors of S and form an 
orthonormal basis, and A is a diagonal n x n matrix whose entries are the corresponding 
eigenvalues. We assume that $ and A are ordered such that the eigenvalues decrease from 
largest to smallest along the diagonal of A. Thus, 

= <|.A-^<1)^ , (20) 

where the pseudoinverse of A may be used in cases of small or zero eigenvalues to find the 
pseudoinverse of S. This is calculated in the same manner as the pseudoinverse of S in 
Equation (ITTll . 


^-1 


( 17 ) 


2.4. Karhunen-Loeve Image Processing 

While the previous section describes viable methods for regularizing the covariance 
and achieving an inverse calculation, there is another possible approach: Rather than using 
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a pseudoinverse, we can project the target signal onto a subset of an optimally 


compacting basis using the Karhunen-Loeve (KL) theorem, as in 
To do so, we dehne: 


e nergy 


Sonmmer et al 


(120 12h 




( 21 ) 


where Z is the n x p matrix of KL transform vectors and $ is the matrix dehned by 
Equation ([19]). 


From our earlier dehnition of the reference set, we know that the matrix ZZ^ will 
be positive semi-dehnite in the general case (and positive-dehnite when all elements of 
the reference set are lin early independent), and thus has a unique principal square root 


([Horn fc .Tohnson 


2012 ). Using the shorthand B = y/A 


BB = A for the matrix 


square root, we can write: 


\/ZZT = 


( 22 ) 


Dehning the diagonal matrix G (where again the pseudo-inverse of A can be used in 
cases of zero eigenvalues of S) as: 

G = (23) 

VP - 1 

we can write the zero row mean, normalized version of Z as: 

Z = GZ. (24) 


The matrix Z is our linear tra nsform operator and is a decorrellating, optimally compacting 


basis for R flRao fc Yipl 1200(1 ). In order to drop any zero eigenvalues in the case where the 


reference set elements are not linearly independent, and to avoid overhtting the target, Z is 
truncated to k rows: 

Zk = hZ , ( 25 ) 
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where Ik is the n x n identity matrix truncated to the hrst k rows (hnal dimension k x n): 


h = 


kkxk Ofcx(n—fc) 


(26) 


To reconstruct the target image t we project it onto Zk'. 


t = 


(27) 


and, as before, recover the signal by subtracting this from the original image: 


s = {I -ZlZt)t. 


(28) 


This is equivalent to: 


s= / 


1 t 


p — 1 


(29) 


where F is the matrix formed by padding the transpose of Ik with n — k zero columns; 


F = 


^kxk ^kx{n—k) 

0(n—fc)x/c 0(n—/c) X (?!—/;:) 


(30) 


This procedure has been titled Karhunen-Loeve Image Processing (KLIP) (jSoummer et ah 
20121 1 ■ 


2.5. Hotelling Observer 


Finally, we have the case of the Hotelling Observer (iBarrett et ah 


2006; 


Caucci et ah 


20071 ). the optimal linear discriminant whose test statistic is calculated via the inverse of 


the full data covariance. For a single image set, in our notation, this statistic would be: 


((S')-‘§)^t 


(31) 
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where S' is the pixel to pixel covariance defined in Eqnation (|5]) (c.f., ICaucci et al.l (120071) 
Eqnations 16-17). 

As the dimensionality of the fnll data covariance is mnch larger than that of the 
image-to-image covariance for typical data sets, a direct inversion may be m ore difficnlt, 


and s ignificantly more data is reqnired for the matrix to be well conditioned flLawson et ah 


2012|). This has led to mnltiple proposed decompositions for the data covariance, with some 
factors estimated via simnlated data, and certain simplifying assnmptions in clnding exact 


know’ 


edge of the backgronnd and fnll statistical knowledge of the PSF as in 


Cancci et ah 


(120071) . Implementations of these calcnlations have be en snccessfnl 


specialized high-performance compnting environments (ICancci et al 


demonstrated on 


20091). 


3. Sequential and Neighboring Calculations 


All of the techniques described in the previous section make heavy use of the covariance 
of the reference data set. For the LOCI and KLIP-like algorithms, it is often necessary to 
calculate hundreds of covariance matrices of reference sets containing many of the same 
images. This is especially true when using data sets derived from angular or spectral 
diversity, where the reference set for each individual image is the remainder of the data set, 
minus a small number of images where the planet signal would occur in the same general 
location as in the target image. For Hotelling observers and library-based templates we 
wish to calculate large covariance matrices possibly including all of the images taken by 
an instrument to date, which can be a very time and memory intensive operation. Finally, 
when applying region-based implementations of KLIP and LOCI we may wish to make 


sma. 

and 


1 modifications to the regions used to optimize the processes (see 


Marois et ah 


Puevo et ah 


(l2ninh 


(120121) for detailed discussions on LOCI optimization zones). In each case. 


we can greatly improve the efficiency of our calculations by replacing batch and redundant 
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processes with sequential ones (see 
processing techniques). 


Stengell fll994l) for a general discussion on sequential 


3.1. Mean and Covariance Update 


As a hrst step in developing the tools specihc to our application, we will outline the 
sequential calculation methods for hnding the sample mean and covariance of a data set. 
Given a set of vectors {xj}” we dehne the sample mean as 


1 


= - y] Xi, 


i=l 


and sample covariance as 


S =-- ^(xi - ^)(xi - . 

r) — ( ^ 


2=1 


n — 1 

Expanding the summation in Equation ([33]), we have 

1 




n — 1 
1 

n — 1 
1 

n — 1 
1 

n — 1 


^ [Xixf - fixf - 

2 = 1 

n n n 

^ Xixf - /X ^ xf - ^ 

2=1 2=1 2 = 1 

n 

Xjxf — 

2=1 
n 

E T 7 

XiXi 


2=1 


(32) 


(33) 


(34) 


where the penultimate step is due to the dehnition of the mean from Equation 


Now, let the mean and covariance at time k be denoted by /x^ and S^. Then: 

k-l 


(/C- = '^^i 

2=1 
k 


(35) 


2=1 


(36) 
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so that 


k k—l 

k^^k -{k- ^ Xi - ^ X* = Xfc 

i=l i=\ 




(fc - 1)/Xfc_i +Xfc 

k 


(37) 

(38) 


Similarly, 


k-l 


{k - 2)Sfc_i = -{k- 

i=l 

k 

(A; - l)Sfc = ^ Xixf - k^lf^^ll , 


1=1 


SO 


k-l 


2=1 


2=1 


Substituting with the expression derived from Equation (EH]), this becomes 

'kHk-^k\ fkiil-y^l 


{k - l)Sfc = {k- 2)Sfc_i + XfcX^ - k^lf,^ll +{k-l) 

k 


g ^ ^j_ 

bfc ^ ^ (A; - 1)^ 


A;- 1 

[(xfc-Mfc)(xfc-/^fc)T • 


A:-l 


(39) 

(40) 


{k - l)Sfc - (A; - 2)Sfc_i = Y ~ + {k- ■ (41) 


(42) 

(43) 


Alternatively, from Equation fE3|l . we can write 

fc-i 

{k - l)Sfc = ^(Xi - /X;.)(xi - ^;.)'^ + (Xfc - /Xfc)(xfc - (44) 

2=1 

so the final term in Equation fl45D becomes 

k-l 

(xfc - /Xfc)(xfc - = {k- l)Sk - ^(Xi - Mfc)(xi - Mfc)^ • 

2=1 


(45) 
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Substituting Equation fl38|l for the summation in the above equation becomes 

fc-i 

i=l 


fc-i , fc-i fc-i , fc-i fc-i 

fh -L ^ ^ rji X ^ ^ rp rh 1. ^ ^ m X ^ ^ ’ 

Mt-i 2^ X, - -Xt Xj - —— x,Mt-i - T 2^ XiX; 


E 


T 

XvX, — 


2=1 2=1 2=1 2=1 2=1 
/c-1 


E 


2=1 *- 


k 

{k-lf 


p- lik-ili-i + + IplxtMLi + pXixJ 


(46) 


fc-1 


- (^ - 1) ) (/^fc-ixl + Xfc/Xfc_i - XfcX^) 


2=1 


A)2 


= {k - 2)Sk-i + -p- - Mfc-iXfc - Xfc/xJ_i + XfcX^) , 

where we used Equation (13^ in the final step. Returning to Equation (05]), we can now 
write 


= (*:-l)Ss-(;;-2)St_i -p- {Mi-tMt-i “ + XixP . 

(47) 

Substituting this back into Equation (031) yields 

Sfc = ^ [(xfc - Rfc_i)(xfc - fik-if] ■ (48) 

Both versions of the covariance update (Equation (031) and Equation (081) ) may be 
written as 

Sfc = aSk-i + /^VfcV^ , (49) 


with 


a 13 Vfc 

kil k _ 

k-l (A:-l)2 Xfc ^J^k 

l/k 


k-2 

k-l 
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3.2. Square Root and Inverse Updates 

In several applications in ^ we are interested in the inverse of the covariance more so 
than the covariance itself. Fortnnately, there are known, simple matrix decompositions that 
allow us to update the inverse covariance directly with new sample vect ors, rather than 


updating the covariance and recalculating the inverse at each step. As in 


Igel et ah 


fcood), 


it can be shown that the update of the Cholesky decomposition of the covariance, 

S = LL^ 

where L is a lower triangular matrix with positive diagonal values, can be written as 

Lfc = \/«Lfc_i + 


(50) 


a 




where is the vector defined implicitly by 


1 + -||Zfc||2 - 1 I VfcZ^, 

a 


(51) 


Vfc — Lfc_iZfc 


(52) 


so that 

llzfcf = vlL^^iL-^iVfc, (53) 

where ()“^ represents the inverse-transpose. Thus, rather than updating the full covariance, 
we can update its square root (in the Cholesky sense), which is a potentially more useful 
quantity. However, this approach still requires that we calculate the inverse of L at each 
time step, whereas we wish to update the inverse covariance, or some decomposition of it, 
instead. 


By the Sherman-Morrison formula (jSherman fc Morrison! Il950[) 


c-i 

0-1 _ fc-i 
- 


a 


^k-l [ T \ Tc-l 




a 




(54) 


where I is the identity matrix. Note that in this formulation, we never need to invert 
VfcvJ itself, avoiding any ill-conditioning problems. However, in numerical applications, it 
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is important to always evaluate first, before any other matrix multiplications, so as 

to guarantee that numerical errors do not corrupt the Hermitian property of the resulting 
matrix. Substituting 

S-^ = , 

Equation fl5^ becomes 


(55) 


T -Tt -1 _ f T , -T T -1 ^ ^Tt -T T -1 

^k ^k ~ ^ ^ \^'^^k^k^k-l^k-l] ^^k^k^k-l^k-l 

a a \ a a 


T 

^k-i 


-1 


I - ^^k-i M 


Lfc-i 


(56) 


Again by the Sherman-Morrison formula, the bracketed term above can be rewritten as 

I - = (^ + • (57) 

a \ a / \ a / 


Dehning matrix T and its Cholesky decomposition, M, as: 

Tfc^J + ^L-VfcvrL-^i and , 

Equation fl56ll becomes 


a 


(58) 


(59) 


SO the update of the inverse Cholesky factor is simply 


U‘ = , (60) 

Note that many of the inverses in the expressions above need not be directly calculated, 
but can be replaced with the equiva lent, specialized LAPACK routines for solving systems 


of linear equations flAnderson 


1999I) . All terms of the form A are the solution of the 


linear system AX = B, and have multiple dedicated solvers, the choice of which depends 
on the specific form of A and B. 
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3.3. Initialization 


In cases where n is smaller than the dimension of x (as in the initial stages of collecting 
a large data set) , the covariance is not full rank and so the inverse covariance is undefined. 
Furthermore, the covariance in these cases is not guaranteed to be positive dehnite so that 
the Cholesky factor will also be undefined. Even if the final number of samples is greater 
than the size of each sample vector, this condition will persist in the initialization and early 
updates of the covariance. 


To address this , we can use an indefinite decomposition (closely related to the Cholesky 


decomposition - see 


WatkinsI (120041) for details), such as 


S = ADA^ 


(61) 


where A is again a lower triangular matrix, while D is a diagonal matrix. Then, Equation 
fl49|) becomes 

AfcDfcA^ = Q;Afc_iDfc_iA^_j^ + /?Afc_iZfcZ^A^_^, (62) 


where z^ is again dehned via 




Therefore, the factors of this decomposition may be updated as 


(63) 


Afc — \/aAfc_iMfc (64) 

Dfc = Gfc , (65) 


where 

MfcGfcMf = Dfc_i + /dzfczj . (66) 

With this definition, the inverse of A^ can always be found, even when S*, is singular. As 
the update progresses to a point where is positive definite, the diagonal elements of 
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will all become positive. At this point, we can convert to the Cholesky factor as 

L = A\/D. (67) 


3.4. Cross-Covariance 


In several of these applications, we will want to also produce covariances conditioned 
on some other set of available information not included in the images themselves. For 
example, we may w ant to accoun t for a tmospheric conditions, or instrument operating 
conditions, etc. (see 


Caucci et ah 


(120091) for details). This conditioning is provided by the 


calculation of the cross-covariance, which can also be updated sequentially. Given a second 
set of vectors {yi}i, the cross-covariance with {x*}" is 


’“’S =-- V(x. - »(y. - ypif 

T) — 1 


n — 1 


( 68 ) 


2=1 


where and are now the means of the two sets, respectively. As before, it can be shown 
that 


= 


n — 1 




and 


We now have 




2=1 


k 


k-l ' (A;-l)2 


[(xfc -X)(yfc 


(69) 


(70) 


(71) 


with all values defined as before, and = Yk — or Yk — Again defining the 

Cholesky factor of as 


xyg _ xy^xy^T 


(72) 
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the Cholesky factor updates are now 

■“Li = (73) 

(74) 

where 

= M,M[ . (76) 


3.5. Neighboring Covariances 


Finally, there is the case of overlapping subsets of vectors for which we wish to 
calculate covariances. In the cases of KLIP and LOCI, we will frequently wish to update the 
covariance with one or more new reference images, but also to remove one or more images 
from the reference set. A concrete illustration of this is the case of applying the algorithm 
to an angular diversity data set, where the noise in each image remains nearly static, while 
the planet signal moves about the center of rotation. The reference set for each image in 
the data set is the subset of images where the planet signal is a minimum angular distance 
away from its location in the target image. Thus, reference sets will be highly overlapping, 
with a relatively small number of images added or dropped in each subsequent reference 
set. This is equivalent to calculating the covariance for a matrix R with one or more rows 
removed. 


As demonstrated in 


Lafreniere et al. 


(120071) and elsewhere, removing rows from the R 


matrix is equivalent to simply removing rows and columns from the S matrix calculated 
from the original R. For example, we can represent the removal of the ith row of R to form 
the truncated matrix R^i as R-i = HR, where H is a block diagonal identity matrix with 
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all zeros in the ith column: 




(76) 


0(j—l)xl 0((i—1) X (n—*) 

0(n—i)x(i—1) 0(n—j)xl ^(n—i) X (n—i) 

with Omxn representing an m x n matrix of zeroes and Inxn the n x n identity matrix. The 
associated covariance would then equal: 




HRR^H^ , 


p — 1 


(77) 


which is simply the removal of the ith row and column from S. 


A less trivial case occurs when we wish to calculate the covariance of R having added 
or removed one or more columns - equivalent to adding and subtracting pixel locations in 


the image. 


Maroisetah 


his ca n be useful when you are working on optimizing zones in LOCI (see 
f 201ol for discussion) or applying KLIP in varying or overlapping annular 


regions. It is also applicable to similar optimizations that can be attempted for the 
Hotelling observer, and can be used together with the sequential calculation of the Hotelling 
covariance described above. 


In order to describe this situation, we now define two non-intersecting sets of vectors 
{xj}” and {yi}™ with sample means fj,y and sample covariances Sx, Sy, respectively. 
Returning to the definition in Equation fl32|) we see that the total sample mean of the union 
of the two sets (represented by x U y) is: 


1 


(u^^ + m/Xy) . 


(78) 
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Similarly, from Equation 
1 


S. = 


Sy = 


SxUy — 


n — 1 

z^- 


2=1 

1 

1 

m — 1 

_ 2=1 

1 



we can write: 


; Xixf - 


YtyJ - 

n m 

Y + Y -{.n + m)^^+y/x; 


n + m — 1 

. 2=1 2=1 
Substituting the first two equations into the third yields: 
1 


T 

xUy 


(79) 

(80) 
(81) 


SxUy — 


n + m — 1 
1 

n + m — 1 


(n - l)Sx + (m - l)Sy + + m^i ^ - 


n + m 


[n^i^ + m/Xy) + m/Xy)^ 


(n - l)Sx + (m - l)Sy + 


nm 
n + m 


(Mx - My) (Mx - My)‘ 


(82) 


Rewriting Equation (IHTj) in a slightly different way, and substituting in Equation fl78|) we 
get: 

+ l)Sx+y - (n - l)Sx - - m^y/Xy + (n + m)^x+y^^+y] 

{n + m- l)Sx+y - (n - l)Sx - ^ - M^uy) (Rx “ Rxuy)^ 


m — 1 


m 


(83) 

Equations (15^ and fl55]) give us the ability to quickly update a covariance by adding and 
removing elements from the reference set without recalculating the entire covariance matrix, 
leading a signihcantly smaller number of operations, especially when the total data set is 
signihcantly larger than the number of images added and dropped for each reference set. 

In the case where only one vector is being added and removed at each iteration, we can 
write a single set of update equations: 

1 


Rfc+i ~ Rfc + 


Sfc+i — Sfc + 


N-1 

N-1 


(iV-2)2 L 


[Xfc_i - Xfc] 

(xi-l - llt+i) (xy_i - Mt-l) - (xt - Mi) (Xi - l^tf 


(84) 

(85) 
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where a set of N sample vectors is taken — 1 at a time, with vector dropped and 
vector Xfc_i added at each step. Thns, Sk and are always the covariance and mean of the 
snbset of the sample vector set {xj}^;^. 

Fignre [2] demonstrates the ntility of these eqnations by comparing the execntion time 
of two programs (rnnning in the same environment), which calcnlate all of the covariances 
of snbsets of sample vectors drawn from an increasing data set. Note that both axes of 
this hgure use logarithmic scales. The hrst program, represented by the dashed curve, 
calculates all covariances from scratch, and has geometrically increasing execution time. 
The second program, represented by the solid curve, uses equations flS^ and fl55]l to update 
covariances and has strictly linearly increasing execution time. All times are normalized by 
the minimum execution time of the second program. It is clear that in applications where 
we must continuously evaluate covariances of closely related data sets, this approach can 
lead to signihcant decreases in computation time. Figure [3] plots the maximum differences 
between covariances calculated by the two codes, which increase with the number of sample 
vectors, but remain within a factor of 10 of the data type precision. 


4. Conclusions 

We have presented a group of techniques that can signihcantly improve the efficiency 
of calculations associated with some of the most frequently employed post-processing 
techniques for planetary imaging. In particular, the introduction of sequential and 
neighboring covariance calculations can turn highly intensive calculations into relatively 
simple processes that can be run on conventional hardware in real time. The increased 
computational speed has the additional benefit of allowing the user to more freely vary 
other parameters in the computation, which can be very important for algorithms such as 
LOCI where there are many additional inputs in the form of the selected subtraction and 
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optimization zones. 

Many thanks to the attendees of the 2012 Exoplanet Imaging Workshop, and especially 
Lisa Poyneer, Harry Barret, Luca Caucci, Laurent Pueyo, Remi Soummer, Christian Marois 
and Dimitri Mawet for many useful discussions and patient explanations. Thanks also 
to Bruce Macintosh for his very helpful comments and suggestions and to Jean-Baptiste 
Ruffio for his feedback. Portions of this work were performed under the auspices of the 
U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract 
DE-AC52-07NA27344. 
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Fig. 1.— Left: Simulated 60 second single exposure using the Gemini Planet Imager instru¬ 
ment. The bright spots are astrometric calibration spots generated by the instrument. The 
bright lobes in the dark region are due to atmospheric turbulence and point along the ma¬ 
jor wind direction. Center: One hour of simulated data comprised of 60 second exposures, 
derotated and summed. One planet is clearly visible with a second barely detectable. Right: 
PSF subtracted, summed version of the data set. An additional planet becomes visible. 
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Fig. 2.— Normalized execution time of the calculation of the covariance matrices of all 
subsets of iV — 2, 10 pixel, sample vectors from a set of N vectors, averaged over 100 
executions. The dashed curve represents the (geometrically increasing) execution time of 
code which calculates each covariance from scratch, whereas the solid curve represents the 
(linearly increasing) execution time of code which uses update equations (l8^ and (|83il . 
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Fig. 3.— Maximum difference between covariance values calculated by the two programs in 
Fig. [2j The dashed line represents the precision of the data type used. 
























































































































































































